Abstract

We consider the problem of the evolution of a code within a structured population of 
agents. The agents try to maximise their information about their environment by acquiring 
information from the outputs of other agents in the population. A naive use of
information-theoretic methods would assume that every agent knows how to “interpret” the information
offered by other agents. However, this assumes that one “knows” which other agents one 
observes, and thus which code they use. In our model, however, we wish to preclude that: 
it is not clear which other agents an agent is observing, and the resulting usable information 
is therefore influenced by the universality of the code used and by which agents an agent is 
“listening” to. We further investigate whether an agent who does not directly perceive the 
environment can distinguish states by observing other agents’ outputs. For this purpose, we 
consider a population of different types of agents “talking” about different concepts, and try 
to extract new ones by considering their outputs only. 
1 Introduction


If we consider organisms capable of processing information, then we can argue that they must
be able to internally assign meaning to the symbols they perceive in a code-based manner [10].
For instance, bacteria perceive chemical molecules in their environment and interpret them
in order to better estimate environmental conditions and (stochastically) decide their phenotype.
Plants detect airborne signals released by other plants and are able to interpret them
as attacks by pathogens or herbivores. Therefore, a correspondence between environmental
conditions and chemical molecules must be established. It is in this way that Barbieri characterises
codes, and he proposes three fundamental characteristics for them: they connect two independent
worlds; they add meaning to information; and they are community rules [2].

Codes connect two independent worlds by establishing a correspondence or mapping between 
them. These worlds are independent and thus there are no material constraints for establishing 
arbitrary mappings. The meaning of information comes exclusively from the mapping: symbols by 
themselves are meaningless. Finally, the third property requires that the correspondence between 
the two worlds constitutes an integrated system. 

For instance, human languages establish a correspondence between words and objects [2];
in bacteria it is between chemical molecules and environmental and social conditions [35, 36].
Words (or chemical molecules) by themselves do not have any meaning, and each individual of 
a population can define, arbitrarily to some extent, their own set with its mapping. However, 
populations of individuals sharing the same code are ubiquitous in nature. How is it that codes 
come to be shared by many individuals when their constitution involves arbitrary choices for each
individual? This question is what we are investigating in the present paper. 

For this work, we assume a simple scenario where organisms live in a fluctuating environment. 
If they can perfectly predict the future environmental conditions, they can prepare themselves 
by adopting a proper phenotype, and, therefore, survive. However, when uncertainty about the 
environment remains, organisms will follow a bet-hedging strategy [28], where they try to
maximise their long-term growth rate by adopting the phenotype that matches the environment 
in proportions based on the information they have about it. For example, seeds of annual plants 
germinate stochastically in different periods in relation to the probability of rainfalls, and their 
chances of survival are maximised when they match this probability |S]. 
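
The claim that matching the environmental probabilities maximises long-term growth can be illustrated numerically. The following Python sketch assumes a hypothetical two-state environment in which only the matching phenotype survives and leaves a fixed number of offspring; the function name `growth_rate` and all parameter values are illustrative assumptions, not part of the model presented here.

```python
from math import log2

def growth_rate(p_env, bet, offspring=2.0):
    """Expected log2 growth per generation when only the matching phenotype survives.

    p_env[e]  : probability of environmental state e
    bet[e]    : fraction of the population adopting the phenotype suited to e
    offspring : hypothetical number of descendants per surviving individual
    """
    return sum(p * log2(b * offspring) for p, b in zip(p_env, bet) if p > 0)

p_env = [0.7, 0.3]                                # assumed environmental probabilities
proportional = growth_rate(p_env, p_env)          # bet in proportion to the probabilities
all_in_best = growth_rate(p_env, [0.99, 0.01])    # almost everything on the likelier state
even_split = growth_rate(p_env, [0.5, 0.5])       # ignore the available information
```

Under these assumptions, proportional betting yields the highest growth rate of the three strategies, and its advantage over the even split equals 1 - H(0.7) bits per generation, where H is the binary entropy.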

The relation between information and long-term growth rate can be expressed elegantly in 
information theoretic terms, where an increase in the environmental information of an organism 
is translated into an increase in its long-term growth rate. Such models achieve
the maximisation of the long-term growth rate by maximising an organism’s information about
the environment. If we assume this behaviour in organisms, then those obtaining additional
environmental information from other individuals (beyond that from their own sensors, which we assume do not
completely eliminate environmental uncertainty) will have an advantage
over those that do not, since they would be able to better predict the future conditions. However, 
for individuals to be able to communicate with each other, they must be able to translate symbols 
into environmental conditions, where the output of these symbols results from an individual’s 
code. We consider the code of an individual as a stochastic mapping from its sensor states to a
set of outputs. 

For this study, we consider outputs (or messages) of individuals (or agents) as conventional 
signs. In semiotics, the science of all processes in which signs originate and are stored, communicated,
and become effective, two types of signs are traditionally recognised: conventional signs and
natural signs [7]. In conventional signs there is no physical constraint on the possible mappings;
they are established by conventions. Although in physical systems there can be limitations to the
possible mappings that can be implemented, in this work we assume complete freedom of choice. 
On the other hand, in natural signs, there is always a physical link between the signifier and 
signified, such as smoke as a sign of fire, odours as signs of food, etc. [3]. 

In this work, we are not interested in the particular detailed mechanisms by which an agent 
implements its code, nor how the agent decodes the outputs of other agents. Instead, we focus on 
the theoretical limits on the amount of environmental information an agent can possibly acquire 
resulting from different scenarios of population structure and code distribution. The natural
framework to analyse such quantities is information theory [30]. However, it does not take semantic
aspects into account: it deals only with the frequencies of symbols, not with what they symbolise.
Codes, on the other hand, add meaning to information, which makes the integration of sciences
such as semiotics with information theory non-trivial. In the following section, we present
an information-theoretic model which incorporates the necessity of conventions by dropping from
the model the usual implicit assumption of knowing the identity of the communicating units.

2 Model 

To introduce the model in a progressive manner, let us first consider three agents, $\theta_1$, $\theta_2$ and
$\theta_3$. Each of these agents depends on the same environmental conditions for survival, which are
modelled by a random variable $\mu$. Agents acquire information about the environment through
their sensors, which are modelled by random variables $Y_{\theta_1}$, $Y_{\theta_2}$ and $Y_{\theta_3}$, all three conditioned on
$\mu$, for agents $\theta_1$, $\theta_2$ and $\theta_3$, respectively. We assume each agent acquires the same amount and
aspects of environmental information from $\mu$, i.e. $p(Y_{\theta_1} \mid \mu) = p(Y_{\theta_2} \mid \mu) = p(Y_{\theta_3} \mid \mu)$. Let us further
assume that the information each agent acquires about the environment does not eliminate its
uncertainty, i.e. $H(\mu \mid Y_{\theta_i}) > 0$ for $1 \le i \le 3$. The code of an agent is a stochastic mapping
from its sensor states into a set of outputs, and is represented by the conditional probabilities
$p(X_{\theta_1} \mid Y_{\theta_1})$, $p(X_{\theta_2} \mid Y_{\theta_2})$ and $p(X_{\theta_3} \mid Y_{\theta_3})$ for agents $\theta_1$, $\theta_2$ and $\theta_3$, respectively (see Fig. 1).

Figure 1: Bayesian network representing the relationship between the sensor and output variables of
three agents.

Let us assume that agent $\theta_1$ perceives only the outputs of agents $\theta_2$ and $\theta_3$. One possible
way of computing the information about the environment that agent $\theta_1$ has is to consider the mutual
information between $\mu$ and the joint distribution of the sensor of $\theta_1$ and the outputs of $\theta_2$ and
$\theta_3$: $I(\mu; Y_{\theta_1}, X_{\theta_2}, X_{\theta_3})$. However, by writing down this quantity, we are implicitly assuming that
agent $\theta_1$ “knows” which output corresponds to $\theta_2$ and which output corresponds to $\theta_3$. Therefore,
in this consideration, an agent can theoretically do the translations of the outputs according to
some internal model of other agents and infer the mentioned amount of information about its
environment.

2.1 Indistinguishable sources of messages 

For this study, on the contrary, we consider an agent observing other agents’ messages, but under
the assumption that the originator of a message cannot be identified. In this way, the total amount
of information an agent can infer from the outputs of other agents will depend on the extent to which
it can either identify who the other agents are or rely on them using a coding scheme that
does not depend too much on their particular identity. For instance, if agents $\theta_2$ and $\theta_3$ both
agree on the output for each of the environmental conditions, then agent $\theta_1$ should be able to infer
more environmental information than if they disagree on the output for each of the environmental
conditions, given that agent $\theta_1$ does not know which of the agents it is observing.

To model this idea, let us assume a random variable $\Theta'$ denoting the selected agent. This agent
depends on the same environmental conditions for survival as $\theta_1$, which are modelled, as above,
by a random variable $\mu$. Agents acquire information about the environment through their sensors,
which are modelled by a random variable $Y_{\Theta'}$ conditioned on the index variable denoting the agent
under consideration, $\Theta'$, and $\mu$. The amount of acquired sensory information of a specific agent
$\theta'$ about $\mu$ is given by $I(\mu; Y_{\theta'})$. As above, the code of an agent is a stochastic mapping from its
sensor states into a set of messages, and is represented by the conditional probability $p(X_{\theta'} \mid Y_{\theta'})$
for an agent $\theta'$ (see Fig. 2).





Figure 2: Bayesian network representing the relationships as described above (see text).


However, now we want to model the fact that we do not know which agent is observed. In the
case with maximum uncertainty, $\Theta'$ is uniformly distributed, and then this parametrisation of the
codes considers the outputs of all agents in $\Theta'$ altogether, such that if we are not observing $\Theta'$, we
cannot identify which agent’s output we are observing. In Eq. and Eq. we show two examples
of codes for agents $\theta_2$ and $\theta_3$, while their sensor states are defined by Eq. (Eq. defines the
sensor states of agent $\theta_1$). We compute how much information about the environment there is

If we assume $p(\theta_2) = p(\theta_3) = 1/2$, $p(\mu_1) = p(\mu_2) = 1/2$ and $\epsilon = 0.01$, then if we consider
the codes shown in Eq. we have that $I(\mu; Y_{\theta_1}, X_{\Theta'}) = 0.97872$ bits, where $\Theta'$ consists of agents $\theta_2$
and $\theta_3$. However, had $\theta_2$ and $\theta_3$ “opposite” codes as shown in Eq. then $I(\mu; Y_{\theta_1}, X_{\Theta'}) = 0.9192$
bits, which is exactly $I(\mu; Y_{\theta_1})$, that is, $I(\mu; X_{\Theta'} \mid Y_{\theta_1}) = 0$ bits (agent $\theta_1$ cannot acquire any side
information from the outputs of agents $\theta_2$ and $\theta_3$). We should note here that the sensor states
$y_1$ and $y_2$ of agents $\theta_2$ and $\theta_3$ in the conditional probability shown in Eq. and Eq. refer almost
deterministically to the same environmental condition, and therefore the loss of side information
is entirely due to the incompatible codes. The conditional probabilities of sensor states given
the environmental conditions further defined throughout the paper are also assumed to be almost
deterministic.
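
The effect of agreeing versus opposite codes can be checked with a short computation. The Python sketch below is a minimal illustration under stated assumptions: a binary environment with equiprobable states, a symmetric sensor that reports the wrong state with probability 0.01, and deterministic codes (an identity mapping and its swapped version). These choices, and the names `sensor`, `side_information`, `identity` and `swapped`, are assumptions made for the example rather than the definitions used in the paper.

```python
from itertools import product
from math import log2

EPS = 0.01  # assumed sensor error rate

def sensor(mu, y):
    """p(y | mu): assumed symmetric binary sensor, wrong with probability EPS."""
    return 1.0 - EPS if y == mu else EPS

def mutual_information(joint):
    """I(A; B) in bits for a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

def side_information(codes):
    """I(mu; Y_theta1, X_Theta') when the unseen sender is drawn uniformly from `codes`.

    Each code is a dict mapping a sensor state y to an output x (deterministic).
    """
    joint = {}
    for mu, y1, y_sender in product((0, 1), repeat=3):
        for code in codes:
            p = 0.5 * sensor(mu, y1) * sensor(mu, y_sender) / len(codes)
            key = (mu, (y1, code[y_sender]))
            joint[key] = joint.get(key, 0.0) + p
    return mutual_information(joint)

identity = {0: 0, 1: 1}
swapped = {0: 1, 1: 0}

agree = side_information([identity, identity])   # both senders use the same code
oppose = side_information([identity, swapped])   # senders use opposite codes
```

With these assumed distributions, `agree` and `oppose` come out within 1e-4 of the 0.97872 and 0.9192 bits quoted above: when the two senders use opposite codes and cannot be told apart, their mixed output is independent of the environment and carries no side information.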


2.2 Environmental information of a population 

The model shown in Fig. 2 considers the environmental information of agent $\theta_1$, ignoring its
own output $X_{\theta_1}$. Nevertheless, agents ignoring their own outputs is contrary to our assumption that
agents are incapable of identifying the sources of the outputs. On the other hand, we are
assuming a specific type of communication, one which could be classified as persistent within
the different classifications of stigmergy. To incorporate
this option in the model shown in Fig. 2, we could consider the state space of $\Theta'$ as the set
$\{\theta_1, \theta_2, \theta_3\}$. Then, to express not only the environmental information of agent $\theta_1$, but the average
environmental information of the whole population, we can parametrise the agent by a random
variable $\Theta$ (defined over the same state space, representing the same set of agents as $\Theta'$), such
that $p(Y_{\Theta} \mid \mu, \Theta) = p(Y_{\Theta'} \mid \mu, \Theta')$ (i.e., $Y_{\Theta'}$ is i.i.d. to $Y_{\Theta}$, and vice versa).


Figure 3: Bayesian network representing the sensor variables of a set of agents indexed by the random
variable $\Theta$, and the sensor and output variables of a copy of the set of agents indexed by $\Theta$, named $\Theta'$.

In this way, the average environmental information of a population of the agents selected by
$\Theta$ is given by $I(\mu; Y_{\Theta}, X_{\Theta'})$ (see Fig. 3). This measure can be considered as the objective function
to maximise in our model. However, we would be making two important assumptions: first,
this objective function assumes agents have access to the environmental conditions $\mu$, which they
indirectly do but only through their sensors; and second, every agent would perceive the output of
every other agent, including itself. In this work, instead, we propose that agents follow a behaviour
that maximises the similarity of their outputs (via their codes) with the outputs that they
perceive. A consequence of this behaviour is that the average information about $\mu$ is also
maximised. In addition, we will introduce a potentially flexible “population structure”, so that
we can specify which agents interact with which.

2.3 Code similarity 

First, we introduce a copy of the codes of the agents, such that when we instantiate the variables
$X_{\Theta}$ and $X_{\Theta'}$, the probabilities are the same. The structure of the population is then given by
$p(\Theta, \Theta') = p(\Theta)p(\Theta')$. However, the independence of $\Theta$ and $\Theta'$ restricts significantly
the diversity of the structures that can be represented. In such cases, the agents selected by
$\Theta$ perceive the outputs of all the agents selected by $\Theta'$ and vice versa. In order to model a
general interaction structure between agents, we consider $p(\Theta, \Theta')$ not independent, as shown in
the Bayesian network in Fig. 4, where we introduce a helper variable $S$. This allows different
agents selected by $\Theta$ to perceive outputs from exclusive agents selected by $\Theta'$.





Figure 4: Bayesian network representing the relationship of the variables in the model of code
evolution. $Y_{\Theta'}$ is an i.i.d. copy of $Y_{\Theta}$ and $X_{\Theta'}$ is an i.i.d. copy of $X_{\Theta}$. $\Theta'$ covers the same set of agents as
$\Theta$, but its probability distribution is not necessarily the same.

We define the objective function as $I(X_{\Theta}; X_{\Theta'})$, that is, the average code similarity of a
population of agents according to the population structure $p(\Theta, \Theta')$. For instance, if the interaction
probability of two agents is zero, then the similarity of the codes of these two agents is irrelevant
for the objective function. On the other hand, if they interact with probability greater than zero
($p(\theta, \theta') > 0$, for some agents $\theta$ and $\theta'$), then how similar their codes are will influence $I(X_{\Theta}; X_{\Theta'})$.

If we consider our system as a process in time, then at each time-step two agents are chosen
according to $p(\Theta, \Theta')$. Agent $\theta$ reads the output of agent $\theta'$ (generated via its code, which is i.i.d.
over time), and let us assume that it stores the pair $(Y_{\theta}, X_{\theta'})$, i.e. its current sensor state together
with the perceived output. If this is repeated a large number of times, then the total amount of
environmental information that can be inferred from the collected statistics by the population is
bounded by $I(\mu; Y_{\Theta}, X_{\Theta'})$. This is the theoretical limit to which we refer in the introduction, and
for this study we are not interested in how the inference is computed. However, we implicitly
assume that agents decode the perceived outputs according to their codes.
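
To make the objective function concrete, the following Python sketch computes $I(X_{\Theta}; X_{\Theta'})$ for a given population structure. It assumes that all agents share one sensor model and that the two sensor readings are conditionally independent given $\mu$; the function names, the data layout (nested lists) and the two-agent example are illustrative assumptions, not the implementation used for the experiments.

```python
from math import log2

def output_given_env(sensor, code):
    """p(x | mu) = sum_y p(y | mu) p(x | y) for one agent."""
    n_x = len(code[0])
    return [[sum(p_y * code[y][x] for y, p_y in enumerate(row)) for x in range(n_x)]
            for row in sensor]

def code_similarity(p_mu, sensor, codes, structure):
    """I(X_Theta; X_Theta') in bits.

    p_mu[m]         : prior over environmental states
    sensor[m][y]    : p(y | mu), assumed shared by all agents
    codes[a][y][x]  : code p(x | y) of agent a
    structure[a][b] : interaction probability p(theta = a, theta' = b)
    """
    channels = [output_given_env(sensor, c) for c in codes]
    n_x = len(codes[0][0])
    joint = [[0.0] * n_x for _ in range(n_x)]
    for a, row in enumerate(structure):
        for b, w in enumerate(row):
            if w == 0.0:
                continue  # non-interacting pairs do not affect the objective
            for m, pm in enumerate(p_mu):
                for x in range(n_x):
                    for x2 in range(n_x):
                        joint[x][x2] += w * pm * channels[a][m][x] * channels[b][m][x2]
    px = [sum(r) for r in joint]
    px2 = [sum(c) for c in zip(*joint)]
    return sum(p * log2(p / (px[i] * px2[j]))
               for i, r in enumerate(joint) for j, p in enumerate(r) if p > 0)

p_mu = [0.5, 0.5]
sensor = [[1.0, 0.0], [0.0, 1.0]]        # noiseless sensor, for illustration only
identity = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
well_mixed = [[0.25, 0.25], [0.25, 0.25]]
isolated = [[0.5, 0.0], [0.0, 0.5]]      # each agent perceives only its own output

shared = code_similarity(p_mu, sensor, [identity, identity], well_mixed)
clash = code_similarity(p_mu, sensor, [identity, swapped], well_mixed)
apart = code_similarity(p_mu, sensor, [identity, swapped], isolated)
```

In this toy case a shared code gives 1 bit, opposite codes in a well-mixed population give 0 bits, and the same opposite codes give 1 bit again once the two agents no longer interact, which matches the remark that the similarity of non-interacting agents is irrelevant for the objective.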

2.4 Distance between two codes 

In order to visualise the evolution of codes, we define the distance between the codes of two
agents $\theta_i$ and $\theta_j$ as the square root of the Jensen-Shannon divergence between them. This
measure has the property that $0 \le \mathrm{JSD}(\theta_i, \theta_j) \le 1$ when $\log_2$ is used, and the square root yields
a metric. Let us note that this distance requires the sensor states $Y$ to be named identically (for
the corresponding states of $\mu$) among agents in order to be meaningful. As we stated above, this
is (closely) the case in all our experiments. This requirement over the sensor states discards the
possibility of using other measures such as mutual information.


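$$
\begin{aligned}
\mathrm{dist}(\theta_i, \theta_j) &= \sqrt{\mathrm{JSD}\big(p(X_{\theta_i} \mid Y_{\theta_i}) \,\|\, p(X_{\theta_j} \mid Y_{\theta_j})\big)} \\
&= \sqrt{\tfrac{1}{2} D\big(p(X_{\theta_i} \mid Y_{\theta_i}) \,\|\, \bar{p}\big) + \tfrac{1}{2} D\big(p(X_{\theta_j} \mid Y_{\theta_j}) \,\|\, \bar{p}\big)}
\end{aligned}
\tag{5}
$$

where $\bar{p} = \tfrac{1}{2}\big(p(X_{\theta_i} \mid Y_{\theta_i}) + p(X_{\theta_j} \mid Y_{\theta_j})\big)$.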
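
A minimal Python sketch of this distance follows. How the divergence between two conditional distributions is aggregated over sensor states is not spelled out above, so the sketch assumes a weighted average over sensor states with weights `p_y` (uniform by default); this weighting and the function names `kl` and `code_distance` are assumptions of the example.

```python
from math import log2, sqrt

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for two discrete distributions."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def code_distance(code_i, code_j, p_y=None):
    """Square root of the Jensen-Shannon divergence between two codes.

    code[y][x] = p(x | y); both codes must index sensor states identically.
    p_y weights the sensor states (assumed uniform if omitted).
    """
    n_y = len(code_i)
    if p_y is None:
        p_y = [1.0 / n_y] * n_y
    jsd = 0.0
    for w, row_i, row_j in zip(p_y, code_i, code_j):
        mix = [(a + b) / 2 for a, b in zip(row_i, row_j)]  # mixture of the two rows
        jsd += w * (0.5 * kl(row_i, mix) + 0.5 * kl(row_j, mix))
    return sqrt(max(jsd, 0.0))  # guard against tiny negative rounding errors
```

Identical codes are at distance 0, and two deterministic codes that disagree on every sensor state are at the maximal distance 1 when base-2 logarithms are used.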
3 Methods


To illustrate the behaviour of our model, we consider four different scenarios, which are described
in Sec. 4. The common parameters for the first two experiments are the following: the population
consists of 25 agents; the amount and quality of the acquired sensory information is the same for
every agent, that is $p(Y_{\Theta} \mid \mu) = p(Y_{\Theta'} \mid \mu)$. For the third scenario, the only difference is that we
consider only 15 agents, since the number of dimensions to consider with a flexible structure grows quadratically
with the number of agents.

The optimisation algorithm used in the following experiments is CMA-ES (Covariance Matrix
Adaptation Evolution Strategy), which is a stochastic derivative-free method for non-linear
optimisation problems. We utilised the implementation provided by the Shark library v3.0.0
with its default parameters. The
evolutionary algorithm used for optimisation is not intended to represent the actual evolution
of the codes. Instead, we are interested in the solutions of this optimisation process, which are
representative of the possible outcomes of evolution.

To visualise the evolution of the codes of the agents, we use the method of multidimensional 
scaling provided by R version 2.14.1 (2011-12-22). This method takes as input the distance matrix 
between codes, and plots them in a two-dimensional space preserving the distances as well as 
possible. To visualise not only the distances between the resulting codes but also how they relate
to the distances between initial codes, we provide a distance matrix of both initial and resulting 
codes. The initial codes are randomly set by the evolutionary algorithm. 

4 Results 

In this section, we analyse the outcome of the four different scenarios where code similarity is 
maximised. While the outcomes are particular for one simulation, they are illustrative of the 
richness that the model is able to capture, which is described for each scenario. The outcomes 
are typical solutions, and we cannot perform statistics over simulations since the many solutions 
are qualitatively different. However, the outcome of each scenario is presented together with a 
description of alternative outcomes, giving indicators of achievement of local/global optimum. 
4.1 Well-mixed population


In the first scenario, each agent $\theta_i$ perceives the output of every other possible agent $\theta_j$ with the
same probability, that is $p(\theta_i, \theta_j) = 1/25^2$ for every $i, j \in [1, 25]$. The maximum average code
similarity is bounded by $I(Y_{\Theta}; Y_{\Theta'}) = 1.71908$ bits, which is achieved under two conditions: first,
every code must be a one-to-one mapping; second, the code must be universal. This is indeed the
outcome of the performed optimisation, as we show in Fig. 5: the optimised codes (blue points)
converged to a universal code (the distance between any two of them is zero). Each red (diamond)
point corresponds to an initial code.


Figure 5: 2-dimensional plot of code distance: red (diamond) points are the initial codes, at the beginning of the optimisation
process; blue (round) points are the final codes, at the end of the optimisation process (where the distance between every
pair of codes is zero).


The resulting code adopted by the population is a one-to-one mapping between sensor states
and outputs, and any of the 24 possible one-to-one mappings is a global maximum (there are
4 sensor states and 4 possible outputs). However, it is still interesting to briefly analyse the
possible paths towards a universal and optimal code. In Fig. 6 we show the distribution of the
codes adopted by the agents of the population in an iteration of the optimisation process where
the average code similarity is $I(X_{\Theta}; X_{\Theta'}) = 1.18276$ bits. Here, the most popular code is the
suboptimal code shown in Fig. 6(a). This results from the particular initial codes, which drive
the agents temporarily towards a suboptimal code. However, once any of the many-to-one codes
becomes (nearly) universally adopted, any deviation of a single code that improves the code similarity will
eventually drive the convention towards optimality. The fact that this does not require simultaneous
changes in several codes increases the likelihood of improving the code similarity.
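
The count of 24 optimal codes, and the gap between one-to-one and many-to-one codes, can be illustrated with a small Python sketch. The sensor model used here (four equiprobable environmental states and an error of 0.01 spread evenly over the wrong sensor states) is a placeholder assumption, so the numerical bound it produces is not expected to equal the 1.71908 bits reported above; the names `mi` and `similarity` are also illustrative.

```python
from itertools import permutations
from math import log2

N = 4        # sensor states and outputs, as in the text
EPS = 0.01   # assumed sensor error, spread evenly over the wrong states

def mi(joint):
    """Mutual information in bits of a joint distribution given as a matrix."""
    pa = [sum(r) for r in joint]
    pb = [sum(c) for c in zip(*joint)]
    return sum(p * log2(p / (pa[i] * pb[j]))
               for i, r in enumerate(joint) for j, p in enumerate(r) if p > 0)

# p(y, y'): two agents sense the same uniformly distributed environment independently.
sensor = [[1 - EPS if y == m else EPS / (N - 1) for y in range(N)] for m in range(N)]
joint_yy = [[sum(sensor[m][y] * sensor[m][y2] for m in range(N)) / N
             for y2 in range(N)] for y in range(N)]

def similarity(mapping):
    """I(X; X') when every agent uses the same deterministic code y -> mapping[y]."""
    joint = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for y2 in range(N):
            joint[mapping[y]][mapping[y2]] += joint_yy[y][y2]
    return mi(joint)

bound = mi(joint_yy)                           # I(Y; Y'), the upper bound
one_to_one = list(permutations(range(N)))      # all 4! = 24 one-to-one codes
many_to_one = similarity((0, 0, 1, 2))         # two sensor states share an output
```

Every one of the 24 universal one-to-one codes attains the bound, whereas the universal many-to-one code falls strictly below it, since merging two sensor states discards information.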

Figure 6: Representation of the codes $p(x \mid y)$ by a heat-map using inverse greyscale. For each evolved
code, we output the number of agents adopting it: (a) 20, (b) 1, (c) 2, (d) 1, (e) 1. This code distribution was achieved with 25 agents in
a well-mixed population.


4.2 Spatially-structured population 


In another set-up, we assume the agents are structured in a 5 x 5 grid, where $p(\theta, \theta') = 1/105$
if $\theta$ and $\theta'$ are neighbours or when $\theta = \theta'$ (see Fig. 8 for a representation of the structure).
After randomly initialising the codes, the performed optimisation plateaued at an average code
similarity of $I(X_{\Theta}; X_{\Theta'}) = 1.13536$ bits. As in the former scenario, here the optimal solution is
also a universal code with a one-to-one mapping. However, in this case, the result is not a universal
code, as can be appreciated in Fig. 7. Spatially structured populations are sensitive to the initial
codes and to how codes are updated.
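
The normalisation constant 1/105 can be recovered by counting interacting pairs. The Python sketch below assumes a 4-neighbourhood (up, down, left, right) without wrap-around, which is an assumption about the grid; the function name `grid_structure` is illustrative.

```python
def grid_structure(rows=5, cols=5):
    """Uniform p(theta, theta') over ordered pairs that are grid neighbours or identical.

    Assumes a 4-neighbourhood (up, down, left, right) without wrap-around.
    """
    pairs = []
    for r in range(rows):
        for c in range(cols):
            pairs.append(((r, c), (r, c)))  # an agent also perceives its own output
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    pairs.append(((r, c), (nr, nc)))
    weight = 1.0 / len(pairs)
    return {pair: weight for pair in pairs}

structure = grid_structure()
n_pairs = len(structure)
```

Under this assumption there are 25 self-pairs and 80 ordered neighbour pairs (40 grid edges counted in both directions), that is 105 pairs in total, which agrees with the interaction probability of 1/105 stated above.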

Figure 7: 2-dimensional plot of code distance: points in diamond shape represent codes at the
beginning of the optimisation process; rounded points represent codes at the end of the optimisation
process. The points are coloured in order to be able to relate this plot with the figure beside it.

Figure 8: Representation of the spatial structure utilised for the experiment. Agents are assumed to
be distributed in a grid: an edge from one agent to another means that one agent perceives the output
of the other. Agents are labelled (see Fig. 9) and coloured according to their adopted code.


The resulting code distribution among the population is shown in Fig. 9, with 8 different codes
in the population. Whereas well-mixed populations evolved the use of a common code, agreement
on codes occurred only among neighbours in spatially structured populations. As a consequence,
many local conventions are established within neighbourhoods, and, once this situation is reached,
the improvement of the total code similarity requires simultaneous changes to the agents’ codes.
For instance, the code shown in Fig. 9(e) could increase the average similarity of the population if
$p(x_2 \mid y_1) = 1$, as it is in the rest of the codes. However, for this to happen (in this particular case),
at least two agents need to change their code simultaneously (otherwise the average similarity
decreases), which makes the deviation from the resulting code distribution unlikely.

Figure 9: Representation of the codes $p(x \mid y)$ by a heat-map using inverse greyscale. For each evolved
code, we output the number of agents adopting it. This code distribution was achieved with 25 agents in
a grid structure.

4.3 Flexible population structure

For the third scenario, we let the structure co-evolve with the codes without any constraint (the
probability distribution of the interaction between agents, $p(S)$, is optimised together with the
codes). In this case, the resulting average code similarity is nearly optimal, but the code is
not necessarily universal. This is because, when the structure is not fixed, agents form roughly
disconnected clusters of related codes. In this process, the interaction probability of agents with
unrelated codes vanishes. However, once the clusters are formed, the codes of agents are universal within
each cluster, unless the cluster is a single isolated agent (such that no other agent perceives its output).
This is exemplified by the code distribution and population structure we obtained
(see Fig. 10). Here, we have two clusters with universal codes, one optimal (in red) and the other
suboptimal (in yellow). Agents whose codes are dissimilar from those of every other agent they interact with
become isolated in the optimisation process, as the example shows for two agents (light and
dark blue).

To summarise, the optimal code similarity equals $I(Y_{\Theta}; Y_{\Theta'})$, and is achieved, for instance, when
all agents adopt the same one-to-one mapping. Nevertheless, the interaction probability allows
agents to form disconnected clusters of related codes, where several one-to-one mappings could
result while still achieving optimality. Theoretically, we could have as many one-to-one mappings
as the minimum of the number of agents and the total number of one-to-one mappings
(24 in this case).



Figure 10: Each node in the graph corresponds to the code of an agent. There is a weighted edge between
agents $\theta_i$ and $\theta_j$ if $p(\theta_i, \theta_j) > 0$ (which is the weight). We omit the weights of edges in the graph since they
are all of roughly similar value. The temperature colours on top of the nodes indicate the amount of
environmental information they would contribute to any agent perceiving only that agent’s output.


4.4 Emerging concepts in a well-mixed heterogeneous population 

So far, we have only considered populations of agents that acquired the same aspects of information
from $\mu$ (i.e., $p(Y_{\theta_i} \mid \mu) = p(Y_{\theta_j} \mid \mu)$ for any pair of agents $\theta_i$, $\theta_j$). The assumption was that the
information that was relevant for the survival of the agents was the same among the agents
of the population, and this was represented by $\mu$. Now, we consider a more general scenario,
where different types of agents acquire different aspects of the environmental conditions $\mu$. We
investigate whether it is possible for an agent that does not directly perceive the environment at
all (we call this type of agent “blind”) to predict conditions based solely on the outputs of other
agents. We consider a well-mixed population, such that different types of agents are forced to
talk to each other. Considerations with a flexible population structure are not interesting for our
purposes, since in these cases, each type of agent forms a cluster disconnected from clusters of
other types. This was confirmed by simulations which are not shown here.

Let us illustrate the idea with a relatively simple scenario: we consider five types of agents (we
denote the $i$-th type $\varphi_i$), where each type can only distinguish whether the current state of the
environment belongs to its coloured region or not. The environment consists of 9 states, and the
states are uniformly distributed. We illustrate this environment by a 3 x 3 grid, as
shown in Fig. 11, although the square does not denote the physical structure of the environment.
Then, the outputs of each type of agent will be related to the regions they capture. For instance,
for agents of type $\varphi_2$ with the same deterministic code, if $\Pr(\mu \in \{1, 2, 4, 5\} \mid X_{\theta} = x)$ equals one
(for all $\theta$ of type $\varphi_2$), then $x$ will signify that this agent is currently in the region coloured in
red in Fig. 11. We say that a population of agents has a joint concept of the environment if,
by considering its representation of the environmental information they capture, we can obtain
information about the environment, i.e. we require that $I(\mu; X_{\Theta}) > 0$. For instance, the symbol
$x$ in the example above, assuming that it is only utilised by agents of the same type, can be
understood as representing the concept “top-left” of the grid.



Figure 11: Representation of the conditional probabilities $p(Y_{\theta} \mid \mu)$ for an agent $\theta$ of each type. These
are defined such that each type of agent can only distinguish between the coloured region and the white
region. For instance, the sensor of type $\varphi_2$ is defined as $\Pr(Y = y_1 \mid \mu) = 1$ if $\mu \in \{1, 2, 4, 5\}$, and zero
otherwise, and $\Pr(Y = y_2 \mid \mu) = 1$ if $\mu \notin \{1, 2, 4, 5\}$, and zero otherwise. For type $\varphi_1$, $\Pr(Y = y_1 \mid \mu) = 0.5$
and $\Pr(Y = y_2 \mid \mu) = 0.5$ ($|Y| = 2$ for all types of agents).

The amount of environmental information that an agent $\theta$ of type $\varphi_1$ (a blind agent) captures
is $I(\mu; Y_{\theta}) = 0$ bits, while all agents $\theta$ of the other types capture $I(\mu; Y_{\theta}) = 0.991076$ bits (note
that the total entropy in $\mu$ to be resolved is $H(\mu) = 3.16993$ bits). Throughout this study, we
considered that agents predict the environment by considering their perceptions together with the
outputs of other agents. Since the blind agent is not able to capture any direct cue
from $\mu$, we instead consider it capable of perceiving the outputs of both of the agents selected by $\Theta$ and
$\Theta'$. With this relaxed consideration, we say a blind agent has a concept of the environment if
$I(\mu; X_{\Theta}, X_{\Theta'}) > 0$, i.e. we consider the maximum amount of information an agent can possibly
infer from the joint outputs $X_{\Theta}$ and $X_{\Theta'}$.
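
The sensory information values quoted above follow directly from the sensor definitions, as the following Python sketch shows. It uses only the quantities stated in the text (nine equiprobable states, a deterministic sensor for the region {1, 2, 4, 5}, and a blind sensor that outputs either state with probability 0.5); the variable names are illustrative.

```python
from math import log2

STATES = range(1, 10)     # nine equiprobable environmental states
REGION = {1, 2, 4, 5}     # region perceived by the type used as example in the text

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_mu = entropy([1 / len(STATES)] * len(STATES))                 # H(mu)
p_in = sum(1 for s in STATES if s in REGION) / len(STATES)      # Pr(Y = y1)

# The sighted sensor is deterministic, so I(mu; Y) = H(Y) - H(Y | mu) = H(Y).
info_sighted = entropy([p_in, 1 - p_in])

# The blind sensor outputs y1 or y2 with probability 0.5 whatever mu is,
# so H(Y) = H(Y | mu) and the mutual information is zero.
info_blind = entropy([0.5, 0.5]) - entropy([0.5, 0.5])
```

The same value of `info_sighted` is obtained for any type whose region covers four of the nine states, since it depends only on the size of the region.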

Let us recall that the structure of the population is well-mixed, and thus the distribution of
outputs of all agents is considered, including the blind ones, which are not able to express (via
their outputs) any particular concept by themselves (for a blind agent $\theta$, $I(\mu; X_{\theta}) \le I(\mu; Y_{\theta}) = 0$,
i.e. $I(\mu; X_{\theta})$ vanishes). Therefore, whether a blind agent has some concept of the environment
will depend, first, on the universality of the codes of each type of agent (agents representing the
same information with different symbols may create ambiguities) and, second, on the cardinality of
the alphabet of $X$ (i.e. $|\mathcal{X}|$) utilised by the population. A small alphabet will force agents to
represent different concepts of the environment with the same symbols, while a large alphabet is
likely to result in exclusive representations of concepts for each type of agent.

Taking this into account, we ask, is it possible for a blind agent to identify concepts of the 
environment? If so, how are these concepts related to the concepts of the individual agents (other 
than the blind ones)? Is the size of the available alphabet related to the quality of the concepts? 

To study these questions, we performed different experiments varying the size of the alphabet
$|\mathcal{X}|$, where the rest of the parameters remained the same. In these experiments, we optimised the
similarity of codes for a population composed of 20 agents, with 4 agents of each of the five types.
In Table 1 we show that the cardinality of the alphabet of $X$ affects the limit of the amount of
information a blind agent can possibly infer about the environment.

Now, if we measure the uncertainty of the environment for a blind agent for each combination
of outputs X_Θ and X_Θ′, we find that for some of them it is zero. For instance, with |X| = 7, we
found that Pr(μ = 5 | X_Θ = 1, X_Θ′ = 2) = 1.0 (see Fig. 12, where only combinations with
X_Θ < X_Θ′ are shown). These distributions are also valid when swapping the values of X_Θ and
X_Θ′, since in the well-mixed population the structure is symmetric. Looking at the example of
the conditional probability in Fig. 12, we can find many other concepts, although none of them,
apart from the one already discussed, can uniquely identify a state of the environment. For
instance, we have that Pr(μ | X_Θ = 3, X_Θ′ = 6) = 0.33 when μ ∈ {3, 5, 7}, which is a concept for
being on a particular diagonal of the environment.
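
The per-combination uncertainty discussed above can be sketched as follows. This is a minimal illustration, assuming the joint distribution is stored as an array p[mu, x, x'] and that uncertainty is measured as entropy in bits; the function name and the three-state toy distribution are hypothetical and do not reproduce the data of the experiment.

```python
import numpy as np

def posterior_and_uncertainty(joint):
    """Return Pr(mu | x, x') and H(mu | x, x') in bits from p[mu, x, x']."""
    joint = np.asarray(joint, dtype=float)
    p_xx = joint.sum(axis=0)                  # p(x, x'), shape (K, K)
    seen = p_xx > 0                           # output pairs that can occur
    post = np.divide(joint, p_xx, out=np.zeros_like(joint), where=seen)
    logs = np.log2(post, out=np.zeros_like(post), where=post > 0)
    entropy = -(post * logs).sum(axis=0)
    # Pairs that never occur have no defined posterior, so mark them as NaN.
    return post, np.where(seen, entropy, np.nan)

# Hypothetical toy case with three equiprobable states and two symbols:
# the pair (0, 1) only occurs in state 0, the pair (1, 1) in states 1 and 2.
joint = np.zeros((3, 2, 2))
joint[0, 0, 1] = 1 / 3
joint[1, 1, 1] = 1 / 3
joint[2, 1, 1] = 1 / 3

post, h = posterior_and_uncertainty(joint)
print(post[:, 0, 1], h[0, 1])  # one state has probability 1, uncertainty is zero
print(post[:, 1, 1], h[1, 1])  # two states remain possible, uncertainty is positive
```

In this sketch, a zero entry of the uncertainty array corresponds to an output combination that uniquely identifies a state, while a combination whose posterior is spread evenly over a few states corresponds to a coarser concept such as the diagonal mentioned above.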


In Fig. 13 we show the resulting codes (which are universal for each type, including the blind
one) for this particular experiment. Here, the types φ_2 (red) and φ_5 (purple) utilise the same




|X|    I(μ; X_Θ, X_Θ′)
2      0.34621
3      0.56555
4      0.71620
5      0.95467
6      1.08139
7      1.18362
8      1.30919
9      1.30919

Table 1: Results of experiments where the size of the alphabet of a population varies. The
maximum amount of environmental information that a blind agent can infer is achieved with
|X| = 8 and remains equal for bigger alphabets. As the size of the alphabet decreases, this
information also decreases.


Figure 12: Conditional probability p(μ | X_Θ, X_Θ′) in inverse grey-scale. Each row represents a
combination of values of X_Θ and X_Θ′, and each column represents a state of μ.


symbols to represent different environmental conditions. By using a small alphabet for X, we
force ambiguities in the population, but these will be chosen (by evolution) such that they
are minimal. In this way, we maximise the amount of information we can infer from the outputs
(although this can be a local optimum). For instance, in all the experiments the outputs of the
blind agents (type φ_1) never overlapped those of other types (unless we use |X| = 2, where there
is no choice). In other words, blind agents always choose one symbol so that they minimise the
number of symbols they take up from the whole population.








Figure 13: Representation of codes p(X_Θ | Y_Θ, Θ) by a heat-map using inverse grayscale for the experiment
with |X| = 7. For each node, the rows represent a sensor state y, while the columns represent an output
state x. The colours on top of the nodes are used to distinguish the type of agent to whom the code
belongs, and colours are related to those shown in Fig. 11.


In all the performed experiments, we found that for values of |X| > 6, the blind agent can
perfectly predict the environmental state μ = 5 for at least one combination of outputs X_Θ and
X_Θ′. Interestingly, this new concept, which in this particular experiment can be called the “centre”
of the world or environment, cannot be obtained by looking at individual concepts only.

5 Discussion 

We considered four different scenarios of code evolution. In the first one, all agents perceived the
outputs of all agents, including their own. We argued that two main stages of evolution can be
recognised: in the first stage, a universal code is established, which can be optimal or not. If it
is not optimal, then a second stage will achieve optimality. The same result was obtained in [34],
in a model of the evolution of the genetic code (represented as a probabilistic mapping between
codons and amino acids), although there universality and optimality were achieved simultaneously.

In the mentioned work, which developed further the ideas of [3S1 [33] , the authors argue that 
the universality of the genetic code is a consequence of early communal evolution, mediated by 
horizontal gene transfer (HGT) between primitive cells. In this evolutionary process, they argue, 
larger communities will have access (through the exchange of genetic material) to more innovations, 
leading to faster evolution than smaller ones. Then, “it is not better genetic codes that give an 
advantage but more common ones” [53]. Although their model does not explicitly show this
property, it is captured in our model. We show that a more common, but not optimal code is 
widely adopted within a population (see Fig. [^. However, in our model, a code imposes itself 
as universal not because it provides access to more innovations (in our model there is no “code 
exchange”, only the outputs are shared), but because the population structure forces the adoption 
of the most popular code. After this stage, further changes in the code of the agents eventually 
lead to optimality. 

In another related work, [21] explored the origins of language in a scenario consisting of artificial
agents with a coupled perception and production of speech sounds. Although this work is focused
on plausible mechanisms for the origin of language, it assumes the same similarity principle as we
do (hearing a vocalisation increases the probability of producing similar vocalisations), arriving at
the same outcome (a universal language, or code). Other works have considered similar principles
in the evolution of languages: for instance, the naming game [32] and the imitation game [5].
However, these models assume some common conventions in order to evolve new ones. In this 
study, our main assumption was that the population of agents depended on common environmental 
conditions. 

Our second scenario, where the structure of the population is a grid, showed how establishing
local conventions in early stages of evolution constrains the outcome of the code distribution,
since to reconcile different conventions, several simultaneous changes are needed. On the other
hand, in our third scenario, where we let the structure of the population change simultaneously 
with the codes themselves, such situations are avoided by “disconnecting” clusters with dissimilar 
conventions. This property enhances evolution, and can potentially lead to the adoption of several 
different conventions within an increasingly fragmenting, or “speciating” population. 

Our last scenario assumed perceptual constraints on the environmental information of each
agent, and we looked at emerging concepts within a well-mixed population. This scenario was
studied in [20], where, as in our study, new conceptualisations of the world emerged as
a result of considering together the concepts of every agent. In both studies, the new concept
was not representable individually by any agent. Unlike the mentioned study, the new
concepts obtained in our study were the result of a simple similarity maximisation principle, while
in the work of [20], concepts were obtained through the modelling of an explicit fitness function.

The evolution of conventional codes could be interpreted, in the widest sense, as a form of 
cultural evolution. For instance, considering the definition of culture given by [25]: “Culture is
information capable of affecting individuals’ behavior that they acquire from other members of their
species through teaching, imitation, and other forms of social transmission.”, it could be argued
that a form of cultural information is present in organisms, such as bacteria or plants. Although 
there is a dependence among the different dimensions on which information is transmitted in 
organisms (if we assume the dimensions to be, for instance, genetic, epigenetic, behavioural and 
symbol-based, as proposed by m), our model assumes freedom of choice in one dimension, without 
direct influence on the others. 

Finally, communication between individuals of a population opens up the possibility of “signal 
cheaters”, which could be either individuals that do not produce signals themselves but still 
perceive those of the others, or individuals who exploit other individuals’ learned responses to
symbols to their advantage. However, our model does not allow such behaviour, since the code 
producing the outputs functions, implicitly, as the interpreter of the perceived signals. 

6 Conclusion 

In the proposed model, we introduced a key assumption which allowed us to evolve, for some 
structures, universal and optimal codes. This assumption states that an agent cannot distinguish 
the sources of the outputs it perceives from other agents. Following from this, a universal code will
necessarily introduce semantics by relating symbols to environmental conditions (via the internal
states of the agent). Our model proposes an information-theoretic way of measuring the similarity
of codes within a population.

In this work, we proposed, as an evolutionary principle, that agents try to maximise their side
information about the environment indirectly by maximising their mutual code similarity. This
behaviour produces several interesting outcomes in the code distribution of a structured population.
Depending on the population structure, it captures the evolution of a universal and optimal code
(well-mixed population structure), as well as the evolution of different codes organised in clusters
(in a freely evolving structure), which allows the establishment of optimal as well as suboptimal
conventions.

Finally, we considered a well-mixed heterogeneous population with perceptual constraints on 
the agents about the environment, and showed how, just by looking at the outputs of agents, it 
is possible to extract concepts that relate to the environment, concepts that none of the agents of 
the population could individually represent. 